cost model
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- North America > United States > New Mexico > Los Alamos County > Los Alamos (0.04)
- Information Technology (0.67)
- Banking & Finance (0.45)
- North America > United States > California > Santa Clara County > Palo Alto (0.05)
- Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Africa > Mali (0.05)
- North America > United States > New York > New York County > New York City (0.04)
- South America > Argentina > Pampas > Buenos Aires F.D. > Buenos Aires (0.04)
- (8 more...)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.37)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.36)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.32)
- Africa > Mali (0.05)
- North America > United States > New York > New York County > New York City (0.04)
- South America > Argentina > Pampas > Buenos Aires F.D. > Buenos Aires (0.04)
- (8 more...)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Common Concerns 2 Novelty
We thank all reviewers for their valuable comments. We address the concerns raised by them below. The idea of using imitation learning to make approximate decisions is not new. The author needs to provide a wall-clock time cost comparison of different methods. We will include them in the final verision.
Gradient-Based Join Ordering
Join ordering is the NP-hard problem of selecting the most efficient sequence in which to evaluate joins (conjunctive, binary operators) in a database query. As the performance of query execution critically depends on this choice, join ordering lies at the core of query optimization. Traditional approaches cast this problem as a discrete combinatorial search over binary trees guided by a cost model, but they often suffer from high computational complexity and limited scalability. We show that, when the cost model is differentiable, the query plans can be continuously relaxed into a soft adjacency matrix representing a superposition of plans. This continuous relaxation, together with a Gumbel-Softmax parameterization of the adjacency matrix and differentiable constraints enforcing plan validity, enables gradient-based search for plans within this relaxed space. Using a learned Graph Neural Network as the cost model, we demonstrate that this gradient-based approach can find comparable and even lower-cost plans compared to traditional discrete local search methods on two different graph datasets. Furthermore, we empirically show that the runtime of this approach scales linearly with query size, in contrast to quadratic or exponential runtimes of classical approaches. We believe this first step towards gradient-based join ordering can lead to more effective and efficient query optimizers in the future.
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
- Europe > France (0.04)
- (10 more...)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Information Retrieval > Query Processing (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
ParaDySe: A Parallel-Strategy Switching Framework for Dynamic Sequence Lengths in Transformer
Ou, Zhixin, Liang, Peng, Han, Jianchen, Liu, Baihui, Qiao, Linbo
Dynamic sequences with varying lengths have been widely used in the training of Transformer-based large language models (LLMs). However, current training frameworks adopt a pre-defined static parallel strategy for these sequences, causing neither communication-parallelization cancellation on short sequences nor out-of-memory on long sequences. To mitigate these issues, we propose ParaDySe, a novel adaptive Parallel strategy switching framework for Dynamic Sequences. ParaDySe enables on-the-fly optimal strategy adoption according to the immediate input sequence. It first implements the modular function libraries for parallel strategies with unified tensor layout specifications, and then builds sequence-aware memory and time cost models with hybrid methods. Guided by cost models, ParaDySe selects optimal layer-wise strategies for dynamic sequences via an efficient heuristic algorithm. By integrating these techniques together, ParaDySe achieves seamless hot-switching of optimal strategies through its well-designed function libraries. We compare ParaDySe with baselines on representative LLMs under datasets with sequence lengths up to 624K. Experimental results indicate that ParaDySe addresses OOM and CPC bottlenecks in LLM training by systematically integrating long-sequence optimizations with existing frameworks.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.05)
- North America > Canada > Ontario > Toronto (0.04)
- Asia > China (0.04)